20. ABSTRACT (continued)


Recent text generation research resembles recent research in synthesis of vaccines. The research is
designed to construct entities which previously arose naturally. This constructive approach creates
practical and theoretical benefits.

Our  text  generation  research  has  produced  a  large  systemic  English  grammar,  which  is  embedded  in 
a  computer  program.  This  grammar,  which  is  called  Nigel,  generates  sentences.  It  is  controlled  by  a 
semantic  stratum  which  has  been  added  to  the  basic  systemic  framework. 

This paper describes the program, which also is called Nigel. It identifies augmentations of various
precedents in the systemic framework, and it indicates the current status of the program. The paper
has a dual focus. First, on Nigel's processes, it describes the methods Nigel uses to control text to
fulfill a purpose by using its new semantic stratum. Second, concerning Nigel's interactions with its
environment, it shows reasons why Nigel is easily embedded in a larger experimental program.

Although the paper does not focus on Nigel's syntactic scope, that its scope is nontrivial is indicated
by the fact that all of the sentence and clause structures of this abstract are within that syntactic
scope.

1. Progress in Immunology: Synthetic Vaccines

In February of this year, Scientific American described a breakthrough in medical science, the
laboratory synthesis of vaccines against flu and other virus diseases [Lerner 83]. Ever since vaccines
were conceived, research on vaccines and immunology has focused on natural substances and their
effects. Now it has become possible to synthesize vaccines to fight many diseases.

In synthesizing vaccines, scientists had to supplement the established methodologies. In
particular, they had to develop methods of vaccine construction to supplement existing methods of
vaccine identification and description.

When the work on synthesis began, there was a substantial amount of evidence to suggest
that the task would be overwhelmingly complex, and, in a practical sense, impossible. The Scientific
American article describes how they did it. They worked with models at the molecular level, a finer
level of detail than that of most of the preceding work. One of the key stages in the process came
when they discovered how to cause synthesized vaccine elements to express the many abstract
attributes by which vaccines had previously been described (without reference to the molecular
level).

"[We found that] ...synthetic peptides can mimic the distinctions revealed by serologic
studies; in designing synthetic vaccines, one will be able to take advantage of serologic
evidence..."

The key evidence of success, of course, is that the vaccines work. Out of voluminous and
detailed reasoning about molecules, their shapes, and their interactions, come chemicals that
actually prevent the infections that they are supposed to prevent. The success of the vaccines
validates the work at the molecular level, and also validates the particular theories at higher levels on
which it was based. When a synthesized vaccine is ineffective, it points to inadequacies in the
supporting theories.

So, work aimed at creating synthetic vaccines has also created a powerful new tool for
scientific inquiry. Its worth can be fully justified on its scientific benefits alone, or on its practical
benefits alone.

2.  Penman:  Constructive  Research  in  Linguistics 

The work described in this paper is another variety of scientific work in a constructive
methodology. In this case, the items being constructed are texts rather than vaccines, and the test of
effectiveness involves reading the texts rather than resisting a disease. Despite the differences, the
work on synthetic vaccines helps us to understand current work on text generation.

In text generation, the focal task is to create a text, in fluent natural language (often English),
in response to some particular need. The trick is to do so using only the explicit knowledge of
language specified by one's linguistic theories, rather than using the skills of a person. Research of
this sort has been carried out in a scattered fashion for over a decade, generally using a computer as
the site of synthesis and as the repository of the most detailed level of theory [Davey 78, Mann
81, Mann & Moore 80, McKeown 82, Mann & Matthiessen 83a]. Computers here fill the role of
molecular genetics for the vaccine work: they are a relatively new technology, enabling the work in a
practical sense but not in any way fundamental to it.

Text generation research uses a constructive methodology, but it is vitally dependent on prior
descriptive work. Because it is a constructive approach, voluminous detail is required; descriptive
research, with its heavy reliance on processes of abstraction, eliminates details. Descriptive research
can proceed in several independent directions to unreconciled conclusions; constructive research
requires extensive reconciliation of the contributing theories. Just as for the vaccines, the best
validation of the constructive theory is the observable effectiveness of the synthesized results.

Several years ago, we began work on a new text generation system, named Penman, designed
to write texts a few paragraphs long. A previous round of research had led to a computer program
which was able to write a limited range of two-paragraph texts, but which had several serious
shortcomings, especially in the narrow rigidity of its grammar.

We therefore wanted to include in Penman a significant, linguistically justified grammar. Now,
several years later, we have such a grammar, named Nigel after Halliday's learner [Halliday 75]. This
paper passes over the parts of Penman devoted to invention, to text planning and to retrospective
improvement of the text, and concentrates entirely on Nigel, the grammar.


3.  Nigel:  Penman’s  Grammar 

The grammatical framework of Penman is the systemic framework, begun by Michael Halliday
in the late 1950’s.1 It draws on a wide range of prior work, including [Halliday 76, Hudson 78, Halliday
& Martin 81, Fawcett 80, Berry 75, Berry 77, Halliday & Hasan 76] and others. In order to reconcile the
various fragments of grammar and to augment them, all of the prior work has been altered in some
way, sometimes by a simple notational shift, and sometimes by thorough re-representation.

I will describe Nigel in a series of stages, in effect working backward through the generation
process from the level of lexical items to the level of conditions in Nigel’s environment which affect
the particular text produced. The whole discussion will be about Nigel's role in generating single
sentences, because Penman is designed to plan text down to the sentence level, including the
relations that each sentence will express, and then to have Nigel execute each sentence plan
independently.


3.1  Lexicon 

Nigel's lexicon is deliberately oversimplified, because we felt that the limiting technical
problems were elsewhere. It is a lexicon of independent lexical items, without morphology. The
lexicon is well elaborated for lexical features: there are over 100 distinct lexical features, and the
lexicon has items representing over 800 distinct combinations of lexical features. However, these
figures are not particularly significant, since the lexicon has not been extensively developed or tested.

3.2 Realization

Nigel builds syntactic structures by a set of activities usually called "realization" in the
systemic framework. They are distinct from activities which specify the characteristics of a syntactic
unit, formally termed grammatical features. Each syntactic unit is first developed as a set of
grammatical features, which realization converts to a syntactic structure. All of the control over what
is built is exercised during the creation of the feature set; there is no optionality or syntactic variability
in realization.


Each grammatical feature may have one or more realization statements associated with it, each
consisting of a realization operator and a number of operands. Each realization statement makes
some change or introduces some restriction on the structure being produced.

There are three groups of realization operators: those that build structure (in terms of
grammatical functions), those that constrain order, and those that associate features with
grammatical functions.

1. The realization operators which build structure are Insert, Conflate, and Expand. By
repeated use of the structure-building functions, the grammar is able to construct sets of
function bundles, also called fundles. None of these operators are new to the
systemic framework.

2. Realization operators which constrain order are Partition, Order, OrderAtFront, and
OrderAtEnd. Partition constrains one function (hence one fundle) to be realized to the
left of another, but does not constrain them to be adjacent. Order constrains just as
Partition does, and in addition constrains the two to be realized adjacently. OrderAtFront
constrains a function to be realized as the leftmost among the daughters of its mother,
and OrderAtEnd symmetrically as the rightmost. Of these, only Partition is new to the
systemic framework.

3. Realization operators that associate features with functions are Preselect, which
associates a grammatical feature with a function (and hence with its fundle); Classify,
which associates a lexical feature with a function; OutClassify, which associates a
lexical feature with a function in a preventive way; and Lexify, which forces a particular
lexical item to be used to realize a function. Of these, OutClassify and Lexify are new,
taking up roles previously filled by Classify. OutClassify restricts the realization of a
function (and hence fundle) to be a lexical item which does not bear the named feature.
This is useful for controlling items in exception categories (e.g., reflexives) in a localized,
manageable way. Lexify allows the grammar to force selection of a particular item
without having a special lexical feature for that purpose. It is Preselect which makes the
grammar recursive, since Preselect requires choosing a particular grammatical feature in
a lower-ranked pass through the grammar.
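
The lexical operators in item 3 can be pictured as filters over the lexicon. The following is a minimal
sketch, not Nigel's implementation: the lexicon entries, the feature names, and the function
`candidates` are all hypothetical, chosen only to show how Classify, OutClassify, and Lexify differ.

```python
# Toy lexicon: item -> set of lexical features (all names are illustrative).
LEXICON = {
    "he": {"pronoun", "singular"},
    "himself": {"pronoun", "singular", "reflexive"},
    "they": {"pronoun", "plural"},
    "themselves": {"pronoun", "plural", "reflexive"},
}


def candidates(lexicon, classify=(), outclassify=(), lexify=None):
    """Return the lexical items still allowed to realize one function.

    classify:    features the item must bear
    outclassify: features the item must not bear
    lexify:      a specific item that is forced, bypassing the feature tests
    """
    if lexify is not None:
        return [lexify]
    required, banned = set(classify), set(outclassify)
    return sorted(
        item
        for item, features in lexicon.items()
        if required <= features and not (banned & features)
    )


# Classify as singular pronoun, but keep reflexives out.
plain = candidates(LEXICON, classify={"pronoun", "singular"}, outclassify={"reflexive"})
# Force one item without inventing a special feature for it.
forced = candidates(LEXICON, lexify="themselves")
```

In this sketch the reflexive exception is handled by one local restriction rather than by adding a
"non-reflexive" feature to every other item, which is the kind of localized control the text describes.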

In addition to these realization operators, there is a set of Default Function Order Lists.
These are lists of functions which will be ordered in particular ways by Nigel, provided that the
functions on the lists occur in the structure and that the realization operators have not already
ordered those functions. A large proportion of the constraint of order is performed through the use of
these lists.

Systemic ordering is in fact a fairly complex matter. The (as yet unpublished) ordering algorithms of
Nigel constitute a definite and testable proposal for the meanings of the realization operators for
ordering.
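
To make the structure-building and ordering operators concrete, here is a small sketch under stated
assumptions. It is not Nigel's ordering algorithm, which the text notes is unpublished; it simply
treats each ordering operator as a constraint and searches all fundle orderings by brute force. The
class `Unit`, its method names, and the function labels (Subject, Actor, Process, Goal) are
hypothetical. Expand and the Default Function Order Lists are left out.

```python
from itertools import permutations


class Unit:
    """Toy syntactic unit: a list of fundles plus ordering constraints."""

    def __init__(self):
        self.fundles = []       # each fundle is a set of function names
        self.constraints = []   # callables taking (positions, fundle count)

    def _fundle_of(self, function):
        return next(f for f in self.fundles if function in f)

    def insert(self, function):
        """Insert: add a function as a new one-member fundle."""
        self.fundles.append({function})

    def conflate(self, a, b):
        """Conflate: merge the fundles holding a and b into one."""
        fa, fb = self._fundle_of(a), self._fundle_of(b)
        if fa is not fb:
            self.fundles.remove(fb)
            fa |= fb

    def partition(self, left, right):
        """Partition: left precedes right, not necessarily adjacently."""
        self.constraints.append(lambda pos, n: pos[left] < pos[right])

    def order(self, left, right):
        """Order: left immediately precedes right."""
        self.constraints.append(lambda pos, n: pos[right] - pos[left] == 1)

    def order_at_front(self, function):
        self.constraints.append(lambda pos, n: pos[function] == 0)

    def order_at_end(self, function):
        self.constraints.append(lambda pos, n: pos[function] == n - 1)

    def linearize(self):
        """Return the first fundle ordering meeting every constraint, or None."""
        for candidate in permutations(self.fundles):
            pos = {fn: i for i, fundle in enumerate(candidate) for fn in fundle}
            if all(check(pos, len(candidate)) for check in self.constraints):
                return [sorted(fundle) for fundle in candidate]
        return None


clause = Unit()
for function in ("Goal", "Process", "Subject", "Actor"):
    clause.insert(function)
clause.conflate("Subject", "Actor")
clause.order_at_front("Subject")
clause.partition("Subject", "Process")
clause.order("Process", "Goal")
result = clause.linearize()
```

Brute-force search is exponential in the number of fundles; it is used here only because it keeps the
meaning of each operator visible in one line.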

3.3  Choice:  Systems  and  Gates 

Nigel  has  systems  of  alternatives,  called  systems  as  in  the  systemic  tradition  (to  the 
confusion  of  the  computational  tradition.)  The  alternatives  are  grammatical  features.  Each  system 
also  has  an  entry  condition,  a  logical  expression  of  grammatical  features.  The  entry  condition  must 
be  satisfied  in  order  to  enter  the  system,  i.e.,  to  have  the  set  of  alternatives  available  for  choice.  When 
a  system  has  been  entered,  one  of  the  alternative  features  must  be  chosen. 

In addition to the systems there are gates. A gate can be thought of as an entry condition
which activates a particular grammatical feature, without choice. These grammatical features are
used just as those chosen in systems. Gates are most often used to provide a feature to be realized,
in response to a collection of features.2
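
The relation between systems, entry conditions, and gates can be sketched as follows. This is an
illustrative reading of the description above, not code from Nigel: the names `System`, `Gate`, and
`traverse`, and every feature name in the example, are assumptions. The `choose` argument stands in
for whatever mechanism selects an alternative once a system has been entered.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class System:
    name: str
    entry: Callable[[set], bool]       # entry condition over chosen features
    alternatives: Tuple[str, ...]      # exactly one must be chosen on entry


@dataclass(frozen=True)
class Gate:
    entry: Callable[[set], bool]       # same kind of condition as a system's
    feature: str                       # activated without any choice


def traverse(systems, gates, start, choose):
    """Enter every system whose entry condition holds; fire every open gate."""
    features, entered = {start}, set()
    changed = True
    while changed:
        changed = False
        for system in systems:
            if system.name not in entered and system.entry(features):
                picked = choose(system)
                if picked not in system.alternatives:
                    raise ValueError(f"{picked!r} is not an alternative of {system.name}")
                features.add(picked)
                entered.add(system.name)
                changed = True
        for gate in gates:
            if gate.feature not in features and gate.entry(features):
                features.add(gate.feature)
                changed = True
    return features


SYSTEMS = [
    System("MOOD", lambda f: "clause" in f, ("indicative", "imperative")),
    System("INDICATIVE-TYPE", lambda f: "indicative" in f, ("declarative", "interrogative")),
    System("SUBJECT-NUMBER", lambda f: "indicative" in f, ("singular-subject", "plural-subject")),
]
GATES = [
    Gate(lambda f: {"declarative", "singular-subject"} <= f, "third-singular-verb"),
]

choices = {
    "MOOD": "indicative",
    "INDICATIVE-TYPE": "declarative",
    "SUBJECT-NUMBER": "singular-subject",
}
selected = traverse(SYSTEMS, GATES, "clause", lambda system: choices[system.name])
```

The gate in the example adds a feature in response to a combination of two chosen features, which
matches the most common use of gates mentioned in the text.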


3.4  Choosing 


The  systemic  literature  has  many  discussions  of  the  oppositions  of  language,  the  direct 
alternations  represented  in  systems.  There  is  much  less  discussion  of  which  alternative  is  most 
suitable  in  particular  cases.  Of  course  for  text  generation,  making  good  individual  choices  is  an 
essential  activity,  and  so  there  must  be  some  representation  of  how  choices  in  the  grammar  are  to  be 
made.3 


In order to specify explicitly how choices are made, a new definitional stratum has been added
to systemic notation. For each system, a formally defined process called a chooser or choice
expert is created. Each such process consists of steps, potentially of several kinds. The principal
kinds of steps are information gathering, discrimination between kinds of conditions, and choice.
When a system is entered, the corresponding chooser process is executed, yielding a choice among
the system’s alternatives.

By  defining  choosers  in  this  way,  we  can  make  explicit  what  particular  choices  depend  upon, 
and  we  can  examine  whether  particular  natural  examples  conform  to  the  conditions  of  choice  which 
have  been  defined. 
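
One way to picture a chooser is as a small decision tree whose internal nodes gather information
and discriminate on the answer, and whose leaves make the choice. The sketch below assumes that
representation; the tuple encoding, the function `run_chooser`, and the inquiry and feature names
are hypothetical, not Nigel's notation. Recording the inquiries that were asked shows how such a
definition makes explicit what a particular choice depended upon.

```python
def run_chooser(node, ask):
    """Walk a chooser's decision tree; return (chosen feature, inquiries asked)."""
    trace = []
    while node[0] == "ask":
        _, inquiry, branches = node
        response = ask(inquiry)              # information gathering
        trace.append((inquiry, response))
        node = branches[response]            # discrimination between conditions
    return node[1], trace                    # choice


# Hypothetical chooser for a number system with three alternatives.
NUMBER_CHOOSER = (
    "ask", "agent-countable?", {
        "no": ("choose", "mass"),
        "yes": ("ask", "agent-multiple?", {
            "yes": ("choose", "plural"),
            "no": ("choose", "singular"),
        }),
    },
)

answers = {"agent-countable?": "yes", "agent-multiple?": "yes"}
feature, trace = run_chooser(NUMBER_CHOOSER, answers.__getitem__)
```

Given a natural example, the same tree can be read in the other direction: the recorded trace states
the conditions under which that example's feature would have been chosen.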

The activity of defining choosers often reveals regularities (or irregularities) which the
grammar does not represent. Several choosers may depend in the same way on the same
determinative condition. Or a notion such as markedness may turn out to represent very different
conditions in the various grammatical systems. Defining choosers typically leads to local refinement of
the grammar, along with strengthened justification for the particular form used.


3.5  Inquiry 

It would be possible to allow choosers to have some sort of unrestricted access to the
knowledge which surrounds them, but this would be unsatisfactory as theory and unmanageable as a
practical text generation resource. Instead, Nigel has a very simple, highly restricted method for
choosers to gain access to the information they need. Choosers gain information to guide their work
only by issuing inquiries stated in a simple inquiry language.

The boundary of the grammar separates two nearly independent symbol systems. Outside of
the boundary, in the environment, there is knowledge of what needs to be said, including both
general knowledge and the text plan. Inside the boundary are grammatical features, grammatical
function symbols, chooser definitions, system and gate definitions, and realization statements. All of
these are beyond the reach of the environment and cannot be designated for manipulation by it.
Conversely, the symbol system of the environment is not directly available to the grammar.
When the grammar needs a symbol to use in some later inquiry, such as a designation of the agent of
a process so that it can inquire whether the agent is multiple, it asks for a temporary symbol for that
purpose. These symbols are discarded once the unit has been built, and the separation of symbol
systems is thereby maintained.
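
The separation of symbol systems can be illustrated with a small sketch. It assumes a hypothetical
`Environment` class with three methods (`identify`, `ask`, `discard_symbols`) and a single made-up
inquiry; none of these names come from Nigel. The point is only that the grammar side holds opaque
temporary symbols and never touches the environment's own objects.

```python
import itertools


class Environment:
    """Holds the text plan; hands the grammar only opaque temporary symbols."""

    def __init__(self, plan):
        self._plan = plan                    # never exposed to the grammar
        self._symbols = {}                   # temporary symbol -> plan entity
        self._ids = itertools.count(1)

    def identify(self, role):
        """Answer 'which entity fills this role?' with a fresh temporary symbol."""
        symbol = f"tmp{next(self._ids)}"
        self._symbols[symbol] = self._plan[role]
        return symbol

    def ask(self, inquiry, symbol):
        """Answer an inquiry about the entity a temporary symbol stands for."""
        entity = self._symbols[symbol]       # KeyError once symbols are discarded
        if inquiry == "multiplicity":
            return "multiple" if entity["count"] > 1 else "unitary"
        raise ValueError(f"unknown inquiry: {inquiry}")

    def discard_symbols(self):
        """Called once the unit has been built."""
        self._symbols.clear()


env = Environment({"agent": {"concept": "dog", "count": 3}})
agent = env.identify("agent")                # grammar receives only 'tmp1'
answer = env.ask("multiplicity", agent)      # later inquiry reuses the symbol
env.discard_symbols()                        # unit finished; 'tmp1' is now meaningless
```

Because the grammar only ever passes symbols back to the environment that issued them, the
knowledge representation behind `_plan` could be replaced without changing any chooser.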

(Lexical items are exceptions to these remarks about symbol system separation. The
choosers assume that associations are maintained between the relevant knowledge and lexical items,
so that, for example, the grammar can select a set of denotationally appropriate terms as candidates to
serve as the head of a nominal group.)

This way of defining and using an inquiry language has important practical and theoretical
benefits. It permits development of the grammar and its semantics while avoiding two traps:

1. defining the grammar’s semantics in terms of particular conventions of knowledge
representation;

2. defining the grammar’s semantics relative to particular syntactic structures, rather than to
[...]

Avoiding the first of these traps is particularly important if Nigel is to be used in other artificial
intelligence research projects. Since Nigel can be independent of particular knowledge
representations, [...]


3.6  Environment 


Nigel's environment, the region outside of the grammar's boundary, consists of two rather different
collections of information. Informally, they are

1. the Knowledge Base: information which exists prior to the demand for a text;

2. the Text Plan: information which is created in response to that demand.

Nigel  leans  heavily  on  both.  It  presumes  that  the  text  plan  contains  definite  intentions  about 
the  ideational,  interpersonal,  and  textual  functions  of  the  unit  being  generated;  much  of  the  ideational 
information  comes  from  the  knowledge  base. 


4.  The  Inquiry  Stratum  as  a  Semantics 

Although definitions differ widely, the term "semantics" is usually used to represent some sort
of specification of correspondence between elements of a linguistic system and elements of another
system distinct from it. Taken this way, there are two senses in which the inquiries of Nigel constitute
a semantics.

First,  given  a  grammar  of  a  language,  including  choosers,  the  collection  of  inquiry  operators 
used  in  the  choosers  constitutes  a  specification  of  what  can  be  expressed  in  syntactic  structure.  For 
example,  multiplicity,  intention  to  emphasize,  and  time  precedence  are  identifiably  expressed  in 
Nigel’s  grammar  of  English.  And  we  can  also  say,  on  the  basis  of  the  collection  of  inquiry  operators, 
that  English  tense  is  indifferent  to  the  contrast  between  moments  and  intervals.  In  this  sense,  the 
collection  of  inquiry  operators  provides  a  semantics  of  the  collection  of  syntactic  structures. 

In  the  second  sense,  given  the  particular  choosers,  systems,  and  realization  statements  of  a 
grammar,  we  can  construct  the  mapping  from  particular  conditions,  i.e.,  particular  collections  of 
environmental  responses  to  inquiries,  to  strings  of  symbols  which  they  yield.  This  is  a  semantics  of 
the  grammar  of  particular  utterances. 
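
The second sense can be shown in miniature by tabulating, for a toy generator, the string produced
under every combination of inquiry responses. Everything in this sketch is an assumption made for
illustration: the two inquiries, their response values, and the tiny `generate` function stand in for
Nigel's choosers, systems, and realization statements, and are far simpler than any of them.

```python
from itertools import product

# Hypothetical inquiries and their possible environmental responses.
INQUIRIES = {
    "agent-multiple?": ("no", "yes"),
    "event-precedes-now?": ("no", "yes"),
}


def generate(responses):
    """Toy stand-in for choosers + systems + realization: responses -> string."""
    plural = responses["agent-multiple?"] == "yes"
    noun = "dogs" if plural else "dog"
    if responses["event-precedes-now?"] == "yes":
        verb = "barked"
    else:
        verb = "bark" if plural else "barks"
    return f"the {noun} {verb}"


def semantics():
    """Map every collection of responses to the string it yields."""
    names = list(INQUIRIES)
    table = {}
    for combo in product(*(INQUIRIES[name] for name in names)):
        table[combo] = generate(dict(zip(names, combo)))
    return table


TABLE = semantics()
```

Each key of `TABLE` is one particular condition of the environment, and each value is the string it
yields, which is the mapping the paragraph above describes. The list of keys in `INQUIRIES` plays
the role of the first sense: it states what this toy grammar is able to express at all.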

Note that, in both cases, a semantics of the grammar is specified, rather than a semantics of
the language as a whole. Lexical aspects are specified in only a very rudimentary way, and the
semantics above the level of the largest grammatical unit is likewise only slightly constrained. These
limitations can be regarded as advantages, because they provide a principled factoring of a very
complex field of inquiry.


5.  State  of  Development 

The generation mechanisms of Nigel have been programmed and tested. Choosers for about
two thirds of its 200-odd systems have been defined. Whenever a new cluster of choosers is defined,
there is an inevitable reexamination of the systems of that region, and of their justification. As a
result, Nigel as a whole is evolving toward an increasingly homogeneous grammar in a fairly
consistent definitional style.

When there are choosers for all of Nigel's systems, many new tests will become possible.
They will involve generating units on demand, attempting to imitate natural examples, and
characterizing syntactic units by the demands for which they were produced. Such tests, while vital
and informative, are local to the grammar. They cannot show how adequate the grammar is as an
element of a text generator.

We look forward to eventually mating Nigel with a programmed text planner, and later with
programmed search processes (for invention) and text improvement processes as well. Only then
can Penman be tested as a synthetic vaccine is tested, by judging its operational results. Seeing the
products of other text generators, we expect that Penman will eventually generate very high quality
text, and that the process of defining the generator will be filled with exciting and informative
research.4


4 Additional information on particular aspects of the work can be found in related reports and
publications: Penman's design: [Mann 83a]; chooser definition: [Mann 82]; Nigel's processes:
[Mann & Matthiessen 83b]; inquiry semantics: [Mann 83b]; extended examples of Nigel's operation:
[Mann & Matthiessen 83a].